Results 1 - 19 of 19
1.
Sensors (Basel) ; 22(16)2022 Aug 17.
Article in English | MEDLINE | ID: mdl-36015930

ABSTRACT

The rapid growth of digital information has produced massive amounts of time series data with rich features, and most time series data are noisy and contain outlier samples, which degrades clustering performance. To efficiently discover the hidden statistical information in such data, a fast weighted fuzzy C-medoids clustering algorithm based on P-splines (PS-WFCMdd) is proposed for time series datasets in this study. Specifically, the P-spline method is used to fit functional data to the original time series, and the resulting smooth fitted data are used as the input of the clustering algorithm to enhance its ability to process the data set. Then, a new weighting method is defined to further limit the influence of outlier sample points during weighted fuzzy C-medoids clustering, improving the robustness of the algorithm. The third version of Mueen's algorithm for similarity search (MASS 3) is used to measure the similarity between time series quickly and accurately, further improving clustering efficiency. The new algorithm is compared with several other time series clustering algorithms, and its performance is evaluated experimentally on different types of time series examples. The experimental results show that the new method speeds up data processing and performs well on all the clustering evaluation indices.
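The speed-up from MASS comes from computing the z-normalized Euclidean distance between a query and every subsequence of a series in O(n log n) via the FFT. Below is a minimal numpy sketch of the underlying distance-profile computation (the simplified MASS formulation; MASS 3 additionally processes the series in cache-friendly chunks, which is omitted here):

```python
import numpy as np

def mass(t, q):
    # Distance profile: z-normalized Euclidean distance between query q
    # (length m) and every length-m subsequence of series t (length n).
    n, m = len(t), len(q)
    q_mean, q_std = q.mean(), q.std()
    # Rolling mean and std of t over windows of length m, via cumulative sums.
    cum = np.cumsum(np.insert(t, 0, 0.0))
    cum2 = np.cumsum(np.insert(t * t, 0, 0.0))
    t_mean = (cum[m:] - cum[:-m]) / m
    t_std = np.sqrt(np.maximum((cum2[m:] - cum2[:-m]) / m - t_mean ** 2, 0.0))
    # Sliding dot products QT[i] = sum_j q[j] * t[i+j], via FFT convolution
    # of t with the reversed query.
    conv = np.fft.irfft(np.fft.rfft(t, n + m) * np.fft.rfft(q[::-1], n + m), n + m)
    qt = conv[m - 1:n]
    # z-normalized ED^2 = 2m * (1 - (QT - m*mu_q*mu_t) / (m*sigma_q*sigma_t))
    dist2 = 2 * m * (1 - (qt - m * q_mean * t_mean) / (m * q_std * t_std))
    return np.sqrt(np.maximum(dist2, 0.0))
```

A subsequence identical to the query yields a distance of (numerically) zero, so the argmin of the profile locates the best match; constant windows (zero std) would need guarding in production use.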


Subjects
Algorithms; Fuzzy Logic; Cluster Analysis; Time Factors
2.
Entropy (Basel) ; 23(6)2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34199499

ABSTRACT

Feature selection is one of the core topics in rough set theory and its applications. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm for neighborhood decision systems that combines the information-theoretic view with the algebraic view. First, the neighborhood relation in the neighborhood rough set model is used to retain the classification information of continuous data, and several uncertainty measures based on neighborhood information entropy are studied. Second, to fully reflect the decision ability and classification performance of the neighborhood system, neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which remedies the shortcoming that most feature selection algorithms consider only the information-theoretic definition or only the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm effectively selects an optimal feature subset whose selection result maintains or improves the classification performance of the data set.
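As a rough sketch of the information-theoretic side: with a neighborhood radius δ, each sample induces a neighborhood granule, entropies are taken over granule sizes, and the joint version additionally intersects each granule with the sample's decision class. The exact credibility/coverage weighting in the paper differs; this is a simplified illustration with assumed Euclidean neighborhoods:

```python
import numpy as np

def neighborhoods(X, delta):
    # Granule n_i: samples within Euclidean distance delta of x_i.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return d <= delta                      # boolean n x n membership matrix

def neighborhood_entropy(X, delta):
    # NE_delta = -(1/n) * sum_i log(|n_i| / n)
    N = neighborhoods(X, delta)
    return float(-np.mean(np.log(N.sum(1) / len(X))))

def neighborhood_joint_entropy(X, y, delta):
    # Joint version: intersect each granule with the sample's decision class.
    N = neighborhoods(X, delta)
    same = y[:, None] == y[None, :]
    return float(-np.mean(np.log((N & same).sum(1) / len(X))))
```

Since intersecting with the decision class can only shrink a granule, the joint entropy is never smaller than the marginal one, mirroring the classical H(X, Y) ≥ H(X) relationship.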

3.
Entropy (Basel) ; 23(2)2021 Jan 27.
Article in English | MEDLINE | ID: mdl-33514041

ABSTRACT

Traditional image denoising algorithms based directly on low-rank matrix restoration obtain prior information from noisy images but pay little attention to the nonlocal self-similarity error between clear images and noisy images. To address this problem, this paper proposes a new image denoising algorithm based on low-rank matrix restoration. Using the nonlocal self-similarity of the image, the proposed algorithm introduces the nonlocal self-similarity error between the clear image and the noisy image into the weighted Schatten p-norm minimization model. In addition, the low-rank error is constrained with the Schatten p-norm to obtain a better low-rank matrix and thereby improve denoising performance. On classic data sets, compared with block matching 3D filtering (BM3D), weighted nuclear norm minimization (WNNM), weighted Schatten p-norm minimization (WSNM), and FFDNet, the proposed algorithm achieves a higher peak signal-to-noise ratio, a better denoising effect, and better visual quality, with improved robustness and generalization.
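The building block of such low-rank denoisers is the proximal operator of the weighted Schatten p-norm, applied to a matrix of grouped similar patches: take an SVD and shrink the singular values under per-value weights. A simplified numpy sketch follows; p = 1 reduces to weighted soft-thresholding, while for p ≠ 1 the literature uses generalized soft-thresholding, which is only approximated here by a fixed-point iteration:

```python
import numpy as np

def wsnm_prox(Y, weights, p=1.0, iters=20):
    # Proximal step of: min_X 0.5*||X - Y||_F^2 + sum_i w_i * sigma_i(X)^p
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    if p == 1.0:
        x = np.maximum(s - weights, 0.0)       # weighted soft-thresholding
    else:
        # Fixed-point sketch of generalized soft-thresholding for p != 1.
        x = s.copy()
        for _ in range(iters):
            x = np.maximum(s - weights * p * np.maximum(x, 1e-12) ** (p - 1.0), 0.0)
    return (U * x) @ Vt                        # U @ diag(x) @ Vt
```

Larger weights on smaller singular values (as in WNNM/WSNM) suppress noise-dominated components while preserving the dominant structure of the patch matrix.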

4.
Article in English | MEDLINE | ID: mdl-32046048

ABSTRACT

BACKGROUND: Hepatocellular carcinoma (HCC) is a major threat to public health, yet few effective therapeutic strategies exist. We aimed to identify potential therapeutic target genes of HCC by analyzing three gene expression profiles. METHODS: The gene expression profiles were analyzed with GEO2R, an interactive web tool for gene differential expression analysis, to identify common differentially expressed genes (DEGs). Functional enrichment analyses were then conducted, followed by construction of a protein-protein interaction (PPI) network from the common DEGs. The PPI network was employed to identify hub genes, and the expression level of the hub genes was validated by mining the Oncomine database. Survival analysis was carried out to assess the prognostic value of the hub genes in HCC patients. RESULTS: A total of 51 common up-regulated DEGs and 201 down-regulated DEGs were obtained from the differential expression analysis of the profiles. Functional enrichment analyses indicated that these common DEGs are linked to a series of cancer events. We finally identified 10 hub genes, six of which (OIP5, ASPM, NUSAP1, UBE2C, CCNA2, and KIF20A) are reported as novel HCC hub genes. Mining the Oncomine database validated that the hub genes are expressed at significantly higher levels in HCC samples than in normal samples (t-test, p < 0.05). Survival analysis indicated that overexpression of the hub genes is associated with a significant reduction (p < 0.05) in survival time in HCC patients. CONCLUSIONS: We identified six novel HCC hub genes that might serve as therapeutic targets for drug development for some HCC patients.
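The "common DEGs across profiles" step can be sketched with a plain Welch t statistic per gene followed by set intersection over each profile's top-ranked genes. This is a simplification (GEO2R is built on limma's moderated t-test, and real pipelines threshold on adjusted p-values and fold change rather than taking a fixed top-k):

```python
import numpy as np

def welch_t(tumor, normal):
    # Per-gene Welch t statistic; rows = genes, columns = samples.
    m1, m2 = tumor.mean(1), normal.mean(1)
    se2 = tumor.var(1, ddof=1) / tumor.shape[1] + normal.var(1, ddof=1) / normal.shape[1]
    return (m1 - m2) / np.sqrt(se2)

def common_degs(profiles, k=100):
    # Intersect the top-k |t| gene sets across all expression profiles.
    tops = [set(np.argsort(-np.abs(welch_t(t, n)))[:k]) for t, n in profiles]
    return set.intersection(*tops)
```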


Subjects
Hepatocellular Carcinoma; Gene Expression Regulation, Neoplastic; Liver Neoplasms; Hepatocellular Carcinoma/genetics; Hepatocellular Carcinoma/therapy; Computational Biology; Gene Expression Profiling; Genes, Neoplasm; Humans; Liver Neoplasms/genetics; Liver Neoplasms/therapy; Molecular Targeted Therapy; Prognosis
5.
Article in English | MEDLINE | ID: mdl-31936708

ABSTRACT

Dengue fever (DF) is one of the most rapidly spreading diseases in the world, and accurate, timely forecasts of dengue might help local governments implement effective control measures. To forecast DF cases accurately, it is crucial to model the long-term dependency in time series data, which is difficult for typical machine learning methods. This study aimed to develop an accurate and timely dengue forecasting model based on long short-term memory (LSTM) recurrent neural networks, using only monthly dengue cases and climate factors. The performance of the LSTM models was compared with that of previously published models when predicting DF cases one month into the future. Our results showed that the LSTM model reduced the average root mean squared error (RMSE) of the predictions by 12.99% to 24.91%, and reduced the average RMSE of the predictions in the outbreak period by 15.09% to 26.82%, compared with the other candidate models. The LSTM model thus achieved superior performance in predicting dengue cases compared with previously published forecasting models. Moreover, transfer learning (TL) can improve the generalization ability of the model in areas with lower dengue incidence. The findings provide a more precise dengue forecasting model that could also be applied to other dengue-like infectious diseases.
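Independent of the network used, one-month-ahead forecasting means turning the monthly series into supervised (lookback window → next value) pairs and scoring with RMSE; the percentage reductions quoted above are relative RMSE improvements over a baseline. A minimal numpy sketch of that framing (the LSTM itself is omitted, and the lookback length is illustrative):

```python
import numpy as np

def make_windows(series, lookback):
    # Supervised pairs: previous `lookback` months -> next month's value.
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def rmse(pred, true):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))

def rmse_reduction(baseline, model):
    # Percentage RMSE reduction of `model` over `baseline`, as reported above.
    return 100.0 * (baseline - model) / baseline
```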


Subjects
Deep Learning; Dengue/epidemiology; Forecasting; Cities/epidemiology; Disease Outbreaks; Humans; Incidence; Neural Networks, Computer
6.
Sci Rep ; 9(1): 17283, 2019 11 21.
Article in English | MEDLINE | ID: mdl-31754223

ABSTRACT

This study aimed to select the feature genes of hepatocellular carcinoma (HCC) with the Fisher score algorithm and to identify hub genes with the Maximal Clique Centrality (MCC) algorithm. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed to examine the enrichment of terms. Gene set enrichment analysis (GSEA) was used to identify the classes of genes that are overrepresented. Following the construction of a protein-protein interaction network with the feature genes, hub genes were identified with the MCC algorithm. The Kaplan-Meier plotter was utilized to assess the prognosis of patients based on expression of the hub genes. The feature genes were closely associated with cancer and the cell cycle, as revealed by GO, KEGG and GSEA enrichment analyses. Survival analysis showed that the overexpression of the Fisher score-selected hub genes was associated with decreased survival time (P < 0.05). Weighted gene co-expression network analysis (WGCNA), Lasso, ReliefF and random forest were used for comparison with the Fisher score algorithm. The comparison among these approaches showed that the Fisher score algorithm is superior to the Lasso and ReliefF algorithms in terms of hub gene identification and has similar performance to the WGCNA and random forest algorithms. Our results demonstrated that the Fisher score followed by the application of the MCC algorithm can accurately identify hub genes in HCC.
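The Fisher score that drives the first step has a closed form: each gene is ranked by the ratio of between-class scatter of its class means to its pooled within-class variance. A minimal numpy sketch (binary or multi-class labels; data and variable names are illustrative):

```python
import numpy as np

def fisher_score(X, y):
    # F_j = sum_c n_c * (mean_cj - mean_j)^2  /  sum_c n_c * var_cj
    mu = X.mean(0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(0) - mu) ** 2   # between-class scatter
        den += len(Xc) * Xc.var(0)                # within-class scatter
    return num / den
```

Genes whose class means are well separated relative to their within-class spread score high; the top-scoring genes form the feature subset passed downstream (here to the PPI/MCC steps).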


Subjects
Algorithms; Biomarkers, Tumor/genetics; Hepatocellular Carcinoma/genetics; Liver Neoplasms/genetics; Hepatocellular Carcinoma/diagnosis; Hepatocellular Carcinoma/mortality; Hepatocellular Carcinoma/pathology; Computational Biology/methods; Datasets as Topic; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Liver/pathology; Liver Neoplasms/diagnosis; Liver Neoplasms/mortality; Liver Neoplasms/pathology; Oligonucleotide Array Sequence Analysis; Prognosis; Protein Interaction Mapping; Protein Interaction Maps/genetics; Survival Analysis
7.
Sci Rep ; 9(1): 8978, 2019 06 20.
Article in English | MEDLINE | ID: mdl-31222027

ABSTRACT

For DNA microarray datasets, tumor classification based on gene expression profiles has drawn great attention, and gene selection plays a significant role in improving the classification performance of microarray data. In this study, an effective hybrid gene selection method for tumor classification based on ReliefF and ant colony optimization (ACO) is proposed. First, in the ReliefF algorithm, the average distance among the k nearest and k non-nearest neighbor samples is introduced to estimate the difference among samples, from which the distances between samples of the same class and of different classes are defined; this evaluates the weight values of genes more effectively. To obtain stable results in extreme cases, a distance coefficient is developed and used in a new formula for updating the gene weight coefficients, further reducing instability during the calculations. By decreasing the distance between samples of the same class and increasing the distance between samples of different classes, the separation between weights becomes more pronounced. The improved ReliefF algorithm thus reduces the initial dimensionality of gene expression datasets and yields a candidate gene subset. Second, a new pruning rule is designed to reduce dimensionality further and obtain a smaller candidate subset. The probability formula for the next point in the path selected by the ants is presented to highlight the closeness of the correlation between the reaction variables. To increase the pheromone concentration of important genes, a new pheromone updating formula for the ACO algorithm is adopted, preventing the pheromone trails left by the ants from being overwhelmed over time, and the weight coefficients of the genes are applied to eliminate the interference of divergent data as much as possible. The improved ACO algorithm therefore exhibits strong positive feedback and quickly converges to an optimal solution through the accumulation and updating of pheromone. Finally, by combining the improved ReliefF algorithm and the improved ACO method, a hybrid filter-wrapper gene selection algorithm called RFACO-GS is proposed. Experimental results on several public gene expression datasets demonstrate that the proposed method is very effective: it significantly reduces the dimensionality of gene expression datasets and selects the most relevant genes with high classification accuracy.
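The ReliefF half of RFACO-GS can be sketched as follows: for each sample, a feature is rewarded when it separates the sample from its nearest misses (other classes) and penalised when it differs from its nearest hits (same class). This is the standard ReliefF update; the paper's distance-coefficient refinement is omitted:

```python
import numpy as np

def relieff(X, y, k=1):
    # Classic ReliefF weight estimation with per-feature range normalisation.
    n, d = X.shape
    span = X.max(0) - X.min(0)
    span[span == 0] = 1.0                    # avoid division by zero
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / span       # normalised per-feature diffs
        order = np.argsort(diff.sum(1))      # nearest samples first
        hits = [j for j in order if j != i and y[j] == y[i]][:k]
        misses = [j for j in order if y[j] != y[i]][:k]
        w += diff[misses].mean(0) - diff[hits].mean(0)
    return w / n
```

Informative features end up with positive weights (large differences to misses, small to hits), noise features with weights near or below zero, giving the candidate subset passed to the ACO stage.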


Subjects
Algorithms; Biomarkers, Tumor; Computational Biology/methods; Neoplasms/diagnosis; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Oncogene Proteins, Fusion/genetics; Computational Biology/standards; Humans; Oligonucleotide Array Sequence Analysis/standards; Reproducibility of Results
8.
Comput Math Methods Med ; 2019: 6705648, 2019.
Article in English | MEDLINE | ID: mdl-30809269

ABSTRACT

To select more effective feature genes, many existing algorithms focus on the selection and evaluation methods for feature genes while ignoring the accurate mapping of the original information during data processing. To solve this problem, a new model, the rough uncertainty metric model, is proposed in this paper. First, the fuzzy neighborhood granule of a sample is constructed by combining the fuzzy similarity relation with the neighborhood radius in the rough set, and the rough decision is defined using the fuzzy similarity relation and the decision equivalence class. Then, the fuzzy neighborhood granule and the rough decision are introduced into the conditional entropy to form the rough uncertainty metric model; in addition, a measure of the significance of feature genes is defined and some related theorems are proved. To make the model tolerant of noise in the data, a variable precision model is introduced and the selection of its parameters is discussed. Finally, based on the rough uncertainty metric model, a feature gene selection algorithm is designed and compared with several existing algorithms of the same type. The experimental results show that the proposed algorithm selects a smaller feature gene subset with higher classification accuracy, verifying that the proposed model is more effective.


Subjects
Algorithms; Diagnosis, Computer-Assisted/statistics & numerical data; Gene Expression Profiling/statistics & numerical data; Neoplasms/diagnosis; Neoplasms/genetics; Databases, Genetic/statistics & numerical data; Feasibility Studies; Fuzzy Logic; Humans; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Reproducibility of Results; Uncertainty
9.
Entropy (Basel) ; 21(2)2019 Feb 01.
Article in English | MEDLINE | ID: mdl-33266854

ABSTRACT

For continuous numerical data sets, attribute reduction based on neighborhood rough sets is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets and yield low accuracy and high cardinality. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which can deal with continuous numerical data while maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes and thereby significantly reduce the computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. To analyze the uncertainty and noise of neighborhood decision systems, several neighborhood entropy-based uncertainty measures are presented on the basis of the Lebesgue and entropy measures, and by combining the algebraic view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and the relationships among them are established, which helps to understand the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. Experimental results on an example and several public data sets show that the proposed method is very effective at selecting the most relevant attributes with high classification accuracy.

10.
Entropy (Basel) ; 21(2)2019 Feb 07.
Article in English | MEDLINE | ID: mdl-33266871

ABSTRACT

Attribute reduction is an important preprocessing step in data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory overcomes the shortcoming that classical rough set theory may lose useful information when discretizing continuous-valued data sets. In this paper, to improve the classification performance on complex data, a novel attribute reduction method using neighborhood entropy measures, combining the algebraic view with the information view in neighborhood rough sets, is proposed; it can deal with continuous data while maintaining the classification information of the original attributes. First, to efficiently analyze the uncertainty of knowledge in neighborhood rough sets, a new average neighborhood entropy is presented by combining the neighborhood approximation precision with neighborhood entropy, based on the strong complementarity between the algebraic definition of attribute significance and the information-view definition. Then, a decision neighborhood entropy is investigated for handling the uncertainty and noise of neighborhood decision systems; it integrates the credibility degree with the coverage degree of neighborhood decision systems to fully reflect the decision ability of attributes. Moreover, some of their properties are derived and the relationships among these measures are established, which helps to understand the essence of knowledge content and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve the classification performance of complex data sets. Experimental results on an example and several public data sets demonstrate that the proposed method is very effective at selecting the most relevant attributes with great classification performance.

11.
Comput Math Methods Med ; 2018: 5490513, 2018.
Article in English | MEDLINE | ID: mdl-29666661

ABSTRACT

The selection of feature genes with high recognition ability from gene expression profiles has gained great significance in biology. However, most existing methods have high time complexity and poor classification performance. Motivated by this, an effective feature selection method called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2) is proposed, based on the concepts of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes class label information into account and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove co-expressed genes. The experimental results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.
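The redundancy-removal stage is straightforward to sketch: walk the genes in ranked order (e.g. by the SLLE-based relevance) and keep a gene only if its Spearman correlation with every already-kept gene stays below a threshold. The threshold value here is illustrative, and this rank computation breaks ties arbitrarily, which is fine for continuous expression values:

```python
import numpy as np

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def drop_coexpressed(X, ranked_genes, thresh=0.9):
    # Keep a gene only if |rho| with every already-kept gene is below thresh.
    kept = []
    for g in ranked_genes:
        if all(abs(spearman(X[:, g], X[:, k])) < thresh for k in kept):
            kept.append(g)
    return kept
```

Because Spearman works on ranks, any monotone co-expression relationship (not just linear) gives |rho| = 1 and is pruned.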


Subjects
Computational Biology; Gene Expression Profiling; Linear Models; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis; Tissue Array Analysis; Algorithms; Bayes Theorem; Data Interpretation, Statistical; Databases, Genetic; False Positive Reactions; Humans; Reproducibility of Results; Software
12.
Bioengineered ; 9(1): 144-151, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-29161975

ABSTRACT

In recent years, tumor classification based on gene expression profiles has drawn great attention, and related research results have been widely applied to the clinical diagnosis of major genetic diseases. These studies are of tremendous importance for accurate cancer diagnosis and subtype recognition. However, microarray gene expression data have small sample sizes, high dimensionality, large noise and data redundancy. To further improve the classification performance of microarray data, a gene selection approach based on the Fisher linear discriminant (FLD) and the neighborhood rough set (NRS) is proposed. First, the FLD method is employed to preliminarily reduce the gene data and obtain features with strong classification ability, which form a candidate gene subset. Then, neighborhood precision and neighborhood roughness are defined in a neighborhood decision system, calculation approaches for neighborhood dependency and attribute significance are given, and a reduction model of neighborhood decision systems is presented; on this basis, a gene selection algorithm based on FLD and NRS is proposed. Finally, four public gene datasets are used in simulation experiments. Experimental results with an SVM classifier demonstrate that the proposed algorithm is effective: it selects a smaller and more discriminative gene subset and obtains better classification performance.


Subjects
Colonic Neoplasms/genetics; Gene Expression Regulation, Neoplastic; Genes, Neoplasm; Leukemia/genetics; Lung Neoplasms/genetics; Prostatic Neoplasms/genetics; Algorithms; Cluster Analysis; Colonic Neoplasms/diagnosis; Colonic Neoplasms/pathology; Computational Biology; Databases, Genetic; Datasets as Topic; Discriminant Analysis; Female; Gene Expression Profiling; Humans; Leukemia/diagnosis; Leukemia/pathology; Lung Neoplasms/diagnosis; Lung Neoplasms/pathology; Male; Microarray Analysis; Multifactor Dimensionality Reduction; Prostatic Neoplasms/diagnosis; Prostatic Neoplasms/pathology
13.
Biomed Mater Eng ; 26 Suppl 1: S1863-9, 2015.
Article in English | MEDLINE | ID: mdl-26405958

ABSTRACT

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering to microarray gene expression data is the choice of parameters, namely the cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. The corresponding steps of the improved SAM (ISAM) and MDM are then given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, with higher clustering accuracy than other related clustering algorithms.
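The max-min distance idea for picking cluster centres can be sketched on its own: after an initial centre, each new centre is the point whose distance to its nearest already-chosen centre is largest, which spreads centres across the clusters. The subtractive-clustering density term used in the paper is omitted, and the random initial pick is an assumption:

```python
import numpy as np

def maxmin_centers(X, k, rng=None):
    # Max-min distance initialisation: each new centre is the point
    # farthest from its nearest already-chosen centre.
    rng = rng or np.random.default_rng(0)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)
```

Centres chosen this way land in distinct clusters whenever between-cluster distances exceed within-cluster spread, which is exactly the stability the MDM initialisation is after.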


Subjects
Fuzzy Logic; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Proteome/metabolism; Signal Transduction/physiology; Algorithms; Animals; Computer Simulation; Humans; Models, Biological; Models, Statistical; Multigene Family/physiology
14.
Biomed Mater Eng ; 26 Suppl 1: S1953-9, 2015.
Article in English | MEDLINE | ID: mdl-26405969

ABSTRACT

In view of the high dimensionality, small sample size, nonlinearity and numeric type of gene expression profile data, logistic regression and the correlation information entropy are introduced into feature gene selection. First, the gene variables are screened preliminarily by logistic regression to obtain the genes with a greater impact on classification; then, a candidate feature set is generated by deleting unrelated features with the Relief algorithm. On this basis, redundant features are deleted using the correlation information entropy; finally, the resulting feature gene subset is classified with a support vector machine (SVM). Experimental results show that the proposed method obtains a smaller subset of genes and achieves a higher recognition rate.


Subjects
Gene Expression Profiling/methods; Linear Models; Neoplasm Proteins/metabolism; Neoplasms/metabolism; Pattern Recognition, Automated/methods; Computer Simulation; Entropy; Humans; Regression Analysis; Reproducibility of Results; Sensitivity and Specificity; Statistics as Topic
15.
Biomed Mater Eng ; 26 Suppl 1: S2011-7, 2015.
Article in English | MEDLINE | ID: mdl-26405977

ABSTRACT

Tumor classification is one of the important problems in microarray gene expression analysis. This paper proposes a new feature selection method for tumor classification using gene expression data. In this method, three dimensionality reduction methods, principal component analysis (PCA), factor analysis (FA) and independent component analysis (ICA), are first introduced to extract and select features for tumor classification, and their specific steps are given. Then, the three algorithms are compared experimentally on acute leukemia data sets; PCA performs best among the three in terms of feature load ratio. However, PCA cannot make full use of the category information. To overcome this weakness, Fisher linear discriminant (FLD) analysis is applied to the components extracted by PCA, and a new approach, principal component discriminant analysis (PCDA), is proposed that retains the strengths of both and classifies better than either PCA or FLD alone. Further experimental results show that the feature subsets selected by PCDA have higher classification ability than those of the other related dimensionality reduction methods, and the proposed algorithm is efficient and feasible for tumor classification.


Subjects
Leukemia/classification; Principal Component Analysis; Algorithms; Discriminant Analysis; Humans
16.
PLoS One ; 9(4): e94868, 2014.
Article in English | MEDLINE | ID: mdl-24743545

ABSTRACT

The recovery of liver mass is mainly mediated by proliferation of hepatocytes after 2/3 partial hepatectomy (PH) in rats. Studying the gene expression profiles of hepatocytes after 2/3 PH will be helpful to investigate the molecular mechanisms of liver regeneration (LR). We report here the first application of weighted gene co-expression network analysis (WGCNA) to analyze the biological implications of gene expression changes associated with LR. WGCNA identifies 12 specific gene modules and some hub genes from hepatocytes genome-scale microarray data in rat LR. The results suggest that upregulated MCM5 may promote hepatocytes proliferation during LR; BCL3 may play an important role by activating or inhibiting NF-kB pathway; MAPK9 may play a permissible role in DNA replication by p38 MAPK inactivation in hepatocytes proliferation stage. Thus, WGCNA can provide novel insight into understanding the molecular mechanisms of LR.


Subjects
Gene Expression Profiling; Gene Regulatory Networks; Hepatectomy; Hepatocytes/metabolism; Liver Regeneration/genetics; Animals; Hepatocytes/cytology; Molecular Sequence Annotation; Rats; Rats, Sprague-Dawley; Signal Transduction/genetics
17.
Biomed Mater Eng ; 24(1): 763-70, 2014.
Article in English | MEDLINE | ID: mdl-24211962

ABSTRACT

Feature selection is a key problem in tumor classification and related tasks. This paper presents a tumor classification approach with neighborhood rough set-based feature selection. First, uncertainty measures such as neighborhood entropy, conditional neighborhood entropy, neighborhood mutual information and neighborhood conditional mutual information are introduced to evaluate the relevance between genes and the related decision in neighborhood rough sets. Then, important properties and propositions of these measures are investigated, and the relationships among them are established. Using an improved minimal-redundancy-maximal-relevance criterion combined with a sequential forward greedy search strategy, a novel feature selection algorithm with low time complexity is proposed. Finally, several cancer classification tasks are demonstrated using the proposed approach. Experimental results show that the proposed algorithm is efficient and effective.


Subjects
Gene Expression Profiling; Neoplasms/classification; Neoplasms/diagnosis; Algorithms; Computational Biology; Computers; Databases, Factual; Humans; Neoplasms/genetics; Predictive Value of Tests; Programming Languages; Reproducibility of Results; Support Vector Machine; Uncertainty
18.
Biomed Mater Eng ; 24(1): 1001-8, 2014.
Article in English | MEDLINE | ID: mdl-24211990

ABSTRACT

In this paper, correlation-based feature selection (CFS) using neighborhood mutual information (NMI) and particle swarm optimization (PSO) are combined into an ensemble technique, and an efficient gene selection algorithm, denoted NMICFS-PSO, is proposed on this basis. Several cancer recognition tasks are gathered for testing the proposed technique. Moreover, a support vector machine (SVM) integrated with leave-one-out cross-validation serves as the classifier for six classification profiles to calculate the classification accuracy. Experimental results show that the proposed method reduces redundant features effectively and achieves superior performance. The classification accuracy obtained by our method is higher in five of the six gene expression problems than that of the other classification methods.
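The PSO half of NMICFS-PSO operates on bit masks over genes. A generic binary-PSO sketch follows; the inertia and acceleration constants are illustrative, and the fitness function is a stand-in for the SVM leave-one-out accuracy (with an NMI-based term) used in the paper:

```python
import numpy as np

def bpso_select(fitness, n_feats, n_particles=10, iters=30, rng=None):
    # Binary PSO: each particle is a 0/1 mask over features; velocities
    # pass through a sigmoid to give per-bit probabilities of being 1.
    rng = rng or np.random.default_rng(0)
    pos = rng.integers(0, 2, (n_particles, n_feats))
    vel = np.zeros((n_particles, n_feats))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest
```

In the full method the fitness would train and score an SVM on the masked gene subset; any mask-to-score function can be plugged in.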


Subjects
Breast Neoplasms/genetics; Gene Expression Profiling; Leukemia/genetics; Algorithms; Breast Neoplasms/diagnosis; Female; Humans; Leukemia/diagnosis; Reproducibility of Results; Software; Stochastic Processes; Support Vector Machine
19.
Biomed Mater Eng ; 24(1): 1307-14, 2014.
Article in English | MEDLINE | ID: mdl-24212026

ABSTRACT

Gene selection is a key step in performing cancer classification with DNA microarrays, and the challenges posed by the high dimensionality and small sample size of microarray datasets remain. Many gene selection algorithms based on rough set theory have been presented, but most are time-consuming. In this paper, a new granular computing-based gene selection method is proposed. First, some granular computing-based concepts are introduced and their important properties are derived, and the relationship between the positive region-based reduct and the granular space-based reduct is discussed. Then, a significance measure for features is proposed to improve the efficiency and decrease the complexity of the classical algorithm. Using a Hashtable and input-sequence techniques, a fast heuristic algorithm is constructed for better computational efficiency of gene selection for cancer classification. Extensive experiments are conducted on five public gene expression data sets and seven data sets from UCI. The experimental results confirm the efficiency and effectiveness of the proposed algorithm.
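The Hashtable trick the abstract mentions can be sketched directly: bucket samples by their attribute-value tuple in a dict, count the buckets that are pure in the decision to get the dependency degree γ, and grow a reduct greedily by significance. This illustrates plain positive-region reduction, not the paper's granular-space algorithm:

```python
import numpy as np

def dependency(X, y, attrs):
    # gamma_B(D) = |POS_B(D)| / |U|: fraction of samples whose
    # B-indiscernibility class (hash bucket) has a single decision value.
    groups = {}
    for i, row in enumerate(X[:, attrs]):
        groups.setdefault(tuple(row), []).append(i)
    pure = sum(len(g) for g in groups.values()
               if len({y[i] for i in g}) == 1)
    return pure / len(X)

def quick_reduct(X, y):
    # Greedy forward selection: repeatedly add the attribute with the
    # largest significance gamma(B + {a}) until full dependency is reached.
    full = dependency(X, y, list(range(X.shape[1])))
    attrs = []
    while dependency(X, y, attrs) < full:
        best = max((a for a in range(X.shape[1]) if a not in attrs),
                   key=lambda a: dependency(X, y, attrs + [a]))
        attrs.append(best)
    return attrs
```

Hashing the value tuples makes each γ evaluation linear in the number of samples, which is the source of the speed-up over pairwise discernibility computations.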


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Artificial Intelligence; Databases, Factual; Humans; Neoplasms/classification; Neoplasms/diagnosis; Reproducibility of Results; Software